study. This report does not promulgate official Air
Force policies or procedures. The technical conclusions
are solely those of the author.


ABSTRACT 


Air Force personnel managers must be able to forecast the force
size accurately. This need is explicit in meeting statutory
budget limitations. Further, officer losses drive accession,
training, and promotion; thus the need for accuracy in
forecasting losses cannot be overemphasized. To accomplish this
objective, loss rates have been generated using Ordinary Least
Squares (OLS) stepwise regression. The objective of this
paper is to expose the relative efficacies of alternative
methods which could be used, viz. Maximum Likelihood Estimation
(MLE) and OLS standardized coefficient (Beta) predictor models.
PART I. INTRODUCTION


Background 

The Directorate of Personnel Data Systems of the Air Force
Military Personnel Center is the prime agency for providing
officer loss rates for all personnel management actions
within HQ USAF. Currently this objective is satisfied by using
OLS stepwise regression and an OLS derivative technique called
Odds for Effectiveness (OFE). It is presumed that readers
of this paper are thoroughly familiar with (1) the need for
and various modeling/simulation applications of loss rates,
and (2) the fundamentals and applications of OLS. Therefore
the objective of this paper is to compare candidate methods
for producing viable loss rates, not to provide a tutorial on
theories as to why or how loss rates may be generated.


A project was initiated late in 1976 to evaluate candidate
methods for producing reliable officer loss rates. Much has
been written of late concerning MLE (see Nerlove and Press #1,
Dempsey and Fast #2, Lockman & Warner #3), and these authors,
as well as others, contend that MLE is more reliable for
predicting a dichotomous dependent variable. The MLE method
was therefore the top contender to replace OLS: MLE is
theoretically more stable over time (i.e., under the
concomitant data shift). In search of alternative methods
which may also achieve time stability, the standardized
coefficient (Beta) model of OLS was also investigated. Four
cases were examined:


1. Contrived data with controlled data shifts.

2. 74->75 Colonels' Retirements, Line officers.

3. 75->76 Colonels' Retirements, Line officers.

4. Fourth year group Non-Rated Line Separations 75->76.


CASE 1: CONTRIVED DATA WITH CONTROLLED SHIFTS


Three primary objectives were sought in this case: (1) known
and controlled data, (2) a small case, and (3) controlled
data shifts. To meet these objectives the following test
was built:

N = 50

Dependent Variable:   Off (0)   70%
                      On  (1)   30%

Variable #1 was distributed as below in the sample test file
and the two x-validation files:

ON/OBS*    TEST FILE    X-VAL FILE #1    X-VAL FILE #2
           VALUE        VALUE            VALUE

1/10           1             0                0
2/10           8             9                5
3/10          27            22               22
4/10          64            72               60
5/10         125           150              110

*Ten observations have a value of 1 in the sample test
file. One of these observations is "ON".

Variables 2 and 3 were held constant in all three files as
below:

VAR 2     0
VAR 3     1

Variable four was distributed as follows in the three files:

ON/OBS     TEST FILE    X-VAL FILE #1    X-VAL FILE #2
           VALUE        VALUE            VALUE

1/10           1             3                0
3/10           2             5                1
7/10           3             6                2
3/10           4             7                2
1/10           5             9                3

CASE 2: COLONELS' RETIREMENT 74->75, LINE OFFICERS

This case used historical data from the Air Force Military
Personnel System with an objective of building a "weak"
predictor model. The following six attributes were chosen
from the '74 data to meet this objective:

1. Total Active Federal Commissioned Service (TAFCS)

2. Age (years)

3. Officer Effectiveness Report, Weighted Mean

4. Officer Effectiveness Report, Current

5. Number Permanent Passovers

6. Below the Zone Selection to Colonel (1/0)

CASE 3: COLONELS' RETIREMENT 75->76, LINE OFFICERS

A more comprehensive analysis was done in this case to
produce a "strong" predictor model. Historical data were
analyzed for colonels' retirements in 1973, 1974 and 1975
to select the attributes and attribute values which would be
used. In this analysis three conditions were imposed: (1)
consistency, (2) discrimination, and (3) representative
incumbency. Use of these constraints provided 13 attributes,
which were then transformed to provide a yet "stronger"
model, or at least one with a higher R-Squared. Stepwise
regression results subsequently reduced the number of attributes
used to six:

1.  Total  Active  Federal  Commissioned  Service  (TAFCS) 

2.  Number  of  Dependents 

3.  Source  of  Commission 

4.  Permanent  Grade 

5.  Duty  Air  Force  Specialty  Code 

6.  Officer  Effectiveness  Report,  Weighted  Mean 

CASE 4: FOURTH YEAR GROUP NON-RATED LINE SEPARATIONS 75->76

Statistical phenomena which occur at or near .50 should be
more difficult to model. Intuitively one expects the potential
discriminators also to be distributed at or near a fifty-fifty
split, just as the dependent variable is. Fourth year
separations approach fifty percent for non-rated line officers,
as this point is the end of obligated service for all except
Air Force Academy accessions. Just as in the colonels' cases,
the constraints of discrimination and representative incumbency
were applied; consistency over time was not. Transforming
the data and applying stepwise regression reduced the predictor
attributes to the following nine:

1. Permanent Grade

2. Source of Commission

3. Officer Effectiveness Report, Current

4. Service Component

5. DOB (year)

6. Major Air Command Assigned

7. Officer Effectiveness Report, Weighted Mean

8. Race

9. Academic Specialty


METHOD 


In each of the four cases prediction equations were developed
using the three methods to be evaluated: MLE, OLS Simple B,
and OLS standardized (Beta). These equations were then
applied to the sample test file to compute the probability of
attrition for each observation in the following manner:

P(A) = Σ Coefficient * F(observed value)

In this application the function-of-the-observed-value concept
was used because of the diversity of the three models: the
B model uses raw values, the Beta model uses standardized
variates, and the MLE model uses deviation from the mean.
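
The difference between raw-value and standardized scoring can be sketched as follows. This is only an illustration under stated assumptions: the function names, the coefficient values, and the tiny two-variable file are hypothetical, and the Beta form shown (each variable standardized with the mean and standard deviation of the file being scored) is one reading of the description above, not a reproduction of the program actually used. The MLE model is omitted.

```python
from statistics import mean, pstdev

def score_b(rows, b_coefs, constant):
    """B model: constant plus the sum of coefficient * raw value."""
    return [constant + sum(b * x for b, x in zip(b_coefs, row)) for row in rows]

def score_beta(rows, beta_coefs):
    """Beta model (assumed form): sum of Beta * standardized variate,
    standardizing each column with the scored file's own mean and SD."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero-variance columns
    return [
        sum(beta * (x - mu) / sd
            for beta, x, mu, sd in zip(beta_coefs, row, mus, sds))
        for row in rows
    ]

# Hypothetical two-variable file, and the same file with every value
# shifted upward (by 6 on the first variable and 3 on the second).
base = [(1, 1), (8, 2), (27, 3), (64, 4), (125, 5)]
shifted = [(a + 6, b + 3) for a, b in base]

# Hypothetical coefficients, chosen only for illustration.
b_base = score_b(base, [0.005, 0.1], -0.5)
b_shift = score_b(shifted, [0.005, 0.1], -0.5)
z_base = score_beta(base, [0.5, 0.4])
z_shift = score_beta(shifted, [0.5, 0.4])
```

In this toy case the uniform shift raises every B score by the same amount while leaving the standardized scores unchanged, which is the kind of behavior the data-shift cases are designed to probe.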

A cut score was determined which correctly identified the
observed number of attriters in the sample test file. For
example, in the Beta model the observations which have a
computed probability of greater than .64 might account for
the known number of observations which were attriters. This
cut score would be used to predict the 1/0 (attrit/non-attrit)
status of the observations in the cross-validation file.
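
A minimal sketch of the cut score step is given below. The helper names and the sample scores are hypothetical, and placing the cut midway between the two boundary scores is a design choice of this sketch; the source states only that the cut reproduces the observed number of attriters.

```python
def find_cut_score(scores, n_observed):
    """Return a cut such that exactly n_observed scores exceed it,
    or None when ties at the boundary make that impossible."""
    if not 0 < n_observed < len(scores):
        return None
    ranked = sorted(scores, reverse=True)
    lowest_in, highest_out = ranked[n_observed - 1], ranked[n_observed]
    if lowest_in == highest_out:
        return None  # a tie straddles the boundary
    return (lowest_in + highest_out) / 2

def classify(scores, cut):
    """1 = predicted attriter, 0 = predicted non-attriter."""
    return [1 if s > cut else 0 for s in scores]

# Hypothetical sample-file scores with 3 known attriters.
sample_scores = [0.10, 0.72, 0.35, 0.64, 0.20, 0.81, 0.05, 0.41, 0.15, 0.30]
cut = find_cut_score(sample_scores, 3)
predicted = classify(sample_scores, cut)
```

The same `cut` would then be passed to `classify` with the scores computed for the cross-validation file.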

The predictor equations were then applied to the cross-
validation files. In each of the three live data cases,
these files were the next year's population, which is analogous
to the real-world problem of predicting into the future year.
However, historical data were employed for cross-validation
purposes, and actual attrition results were known and used for
analysis of prediction efficiency. Likewise in the contrived
data case, the results of the "future" were known and used to
measure predictive strength of the various models.

In all cases the data are displayed not only as the number of
predicted attriters but also in the classical hits/false-
positive/false-negative format commonly used in screening
applications. Additionally the expected value concept was
examined. Expected value might be employed in a stochastic
model wherein the probability of attrition is computed and
compared to a uniform random number to make the 1/0
determination. For this reason, the expected value of a cross-
validation run (the sum of all the computed probabilities)
is a pertinent index of predictive strength in the arena of
officer loss rates.
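
The expected value index and the stochastic 1/0 determination can be sketched as below. The probabilities, the seed, and the function names are hypothetical; the sketch only illustrates that the average count over many stochastic passes approaches the sum of the computed probabilities.

```python
import random

def expected_value(probabilities):
    """Expected number of losses: the sum of the computed probabilities."""
    return sum(probabilities)

def simulate_losses(probabilities, rng):
    """One stochastic pass: an observation attrits when a uniform
    random draw falls below its computed probability."""
    return sum(1 for p in probabilities if rng.random() < p)

probs = [0.1, 0.3, 0.3, 0.5, 0.8]  # hypothetical computed probabilities
ev = expected_value(probs)

rng = random.Random(1977)          # fixed seed so the run is repeatable
runs = [simulate_losses(probs, rng) for _ in range(10_000)]
average = sum(runs) / len(runs)
```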
PART II. RESULTS


CASE  1 - CONTRIVED  DATA 

Correlation matrices are revealing tools in the analysis of
data and are, in fact, the "guts" of OLS. As can be seen
in the table below, the correlation matrix for the sample test
file does not portend a strong predictor model, and the
resulting R-Squared of 0.19 supported this conclusion.

Intercorrelation Matrix for Case 1

          Y       X1      X2      X3      X4

Y       1.00     .29     .04     .04     0
X1       .29    1.00    -.57    -.20     .43
X2       .04    -.57    1.00    -.04    -.37
X3       .04    -.20    -.04    1.00    -.85
X4      0        .43    -.37    -.85    1.00

Stepwise regression produced the following results:

Variable     B Coefficient     Beta Coefficient     F

X1              .00517             .51460           9.14191
X2              .45993             .50182           5.00405
X3              .46047             .50241           1.92134
X4              .12701             .39195           1.01117

Constant       = -.77401
Multiple R     = 0.44001
R-Squared      = 0.19360
Standard Error = 0.43377

Although the last two variables are not significant at .05
by application of the F statistic, the four-variable model
was used because the Beta coefficients indicated significant
influence. Application of the B model to the sample test
file produced a cutting score of 0.33, i.e. classifying all
observations with a computed probability greater than 0.33 as
"1" or "on" resulted in the correct (observed) number of "ons".
As indicated above, the prediction equation and cutting score
were then applied to the cross-validation files. Cross-
validation file #1 was designed to test the models under
extreme positive data shifts. To meet this objective the mean
of X1 was increased from 45 to 51, and the mean of X4 was
increased from 3 to 6, while X2 and X3 were left unchanged.
Using the cut score produced 49 observations identified as
"on" and one observation labeled "off". The expected number
of "ons" was 36. Displayed below are the hits/false-positive/
false-negative results of this prediction:

                 Predicted
                 on     off

Actual    on     15       0
          off    34       1

Since the number of observed records which were "on" was 15,
the tentative conclusion can be drawn that the OLS B model
is not reliable with strong positive data shift. Next the
negative data shift in cross-validation #2 was processed
with the B model.

In this case the mean of X1 was shifted from 45 to 39, X4
from 3 to 1.6, and X2 and X3 remained unchanged. Use of the
cut score predicted 5 observations to be "on". The expected
number of "ons" was 6.7. Below are displayed the hits/false-
positive/false-negative data:

                 Predicted
                 on     off

Actual    on      5      10
          off     0      35

Again the conclusion can be drawn that the B model is not
viable under extreme data shifts.
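
The size of the two controlled shifts can be checked directly from the Case 1 distribution tables for Variable 1 and Variable 4. The short sketch below uses the tabled values; the dictionary keys and the function name are labels introduced here for illustration.

```python
# Values from the Case 1 tables. Each value is carried by ten
# observations, so a plain mean of the five values gives the file mean.
x1 = {
    "test": [1, 8, 27, 64, 125],
    "xval1": [0, 9, 22, 72, 150],
    "xval2": [0, 5, 22, 60, 110],
}
x4 = {
    "test": [1, 2, 3, 4, 5],
    "xval1": [3, 5, 6, 7, 9],
    "xval2": [0, 1, 2, 2, 3],
}

def file_means(table):
    """Mean of the tabled values for each file."""
    return {name: sum(values) / len(values) for name, values in table.items()}

x1_means = file_means(x1)
x4_means = file_means(x4)
```

The resulting means (45, 50.6 and 39.4 for Variable 1; 3, 6 and 1.6 for Variable 4) agree with the rounded figures quoted in the text.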

Beta model results are shown below:

Cross-validation #1 - Positive Data Shift

                 Predicted
                 on     off

Actual    on     10       5
          off     7      28

Cut score prediction = 17
Expected value = 15


Cross-validation #2 - Negative Data Shift

                 Predicted
                 on     off

Actual    on      9       6
          off     7      28

Cut score prediction = 16
Expected value = 15

MLE results follow:

Cross-validation #1 - Positive Data Shift

                 Predicted
                 on     off

Actual    on     10       5
          off     7      28

Cut score prediction = 17
Expected value = 16

Cross-validation #2 - Negative Data Shift

                 Predicted
                 on     off

Actual    on      9       6
          off     2      33

Cut score prediction = 11
Expected value = 14


Case 1 summary is shown below for the three models evaluated.

Cross-validation #1 - Positive Data Shift

                          B      BETA     MLE

% Hits                    32      76      76
% "Ons" = on              30      59      59
% "Offs" = off           100      85      85
% False-positive          68      14      14
% False-negative           0      10      10
Cut score prediction      49      17      17
Expected value            36      15      16

Observed "on"  15


Cross-validation #2 - Negative Data Shift

                          B      BETA     MLE

% Hits                    80      74      84
% "Ons" = on             100      56      82
% "Offs" = off            78      82      85
% False-positive           0      14       4
% False-negative          20      12      12
Cut score prediction       5      16      11
Expected value           6.7      15      14

Observed "on"  15
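
The summary percentages follow from the four cells of each hits/false-positive/false-negative table. The sketch below shows the arithmetic; the function and key names are hypothetical, and the reading of '% "Ons" = on' as the share of predicted "ons" that were actually "on" is inferred from the reported figures rather than stated in the source. The B model cells for the positive shift use the 34 false positives implied by 49 predicted "ons" and 15 observed "ons".

```python
def summarize(hit_on, miss_on, false_on, hit_off):
    """Summary percentages from a 2x2 table.

    hit_on:   actual on,  predicted on
    miss_on:  actual on,  predicted off (false negative)
    false_on: actual off, predicted on  (false positive)
    hit_off:  actual off, predicted off
    """
    n = hit_on + miss_on + false_on + hit_off
    pred_on = hit_on + false_on
    pred_off = miss_on + hit_off

    def pct(part, whole):
        return 100.0 * part / whole if whole else float("nan")

    return {
        "hits": pct(hit_on + hit_off, n),
        "ons_on": pct(hit_on, pred_on),
        "offs_off": pct(hit_off, pred_off),
        "false_positive": pct(false_on, n),
        "false_negative": pct(miss_on, n),
        "cut_score_prediction": pred_on,
    }

# Case 1, positive data shift.
b_positive = summarize(15, 0, 34, 1)
beta_positive = summarize(10, 5, 7, 28)
```

Under these assumptions the results round or truncate to the B and BETA columns of the positive-shift summary.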


CASE 2 - LINE COLONELS' RETIREMENTS, '75 FROM '74

In this case a "weak" predictor model was designed, as is
indicated by the validity vector below and the resultant
R-Squared of .16:

Validity Vector for Case 2

Variable

Tenure (TAFCS)               0.37613
Age                          0.29107
Old OER Mean                -0.17088
No. permanent passovers      0.11924
Last OER                    -0.07066
Below Zone Selection        -0.11534


Stepwise regression produced the following results:

Variable                B Coefficient    Beta Coefficient      F

Tenure (TAFCS)           [illegible]         .31330          215.55[?]
# Permanent Pass           .05433            .07597           20.004
Age                        .00728            .05772            6.892
Old OER Mean              -.00453           -.02754            1.954
Last OER                  -.00955           -.01005             .317
Below Zone Selection       .00429            .00330             .037

Constant       = -.94884
Multiple R     = .39049
R-Squared      = .15248
Standard Error = .39870

Applying the B model and Beta model to the cross-validation
file, the 1975 Line Colonels, produced prediction results as
follows:

OLS Predicted 1975 Line Colonels' Retirements

                    B Model               Beta Model
                    Predicted             Predicted
                    1*       0            1        0

Actual    1         377     409           434     352
          0         211    2422           340    2293

Cut score prediction = 588                774
Expected value       = 697                857
Observed             = 786

* 1 = Retire, 0 = Stay


The MLE results are shown below:

MLE Predicted 1975 Line Colonels' Retirements

                    Predicted
                    1        0

Actual    1         446     340
          0         361    2272

Cut score prediction = 807
Expected value       = 859
Observed             = 786

Case 2 summary results are as follows:

Cross-validation - 1975 Line Colonels' Retirements

                                B      BETA     MLE

% Hits                          82      79      79
% "Retirees" that retired       64      56      55
% "Stayers" that stayed         86      87      87
% False-positive                 6      10      11
% False-negative                12      10      10
Cut score prediction           588     774     807
Expected value                 697     857     859

Observed Retirees  786

CASE 3 - LINE COLONELS' RETIREMENT, '76 FROM '75

As outlined earlier, extensive analysis and data transformation
were undertaken to build a "strong" predictor model. The
resultant validity vector is shown below. An R-Squared of
0.26 was achieved.
Validity Vector for Case 3

Tenure (TAFCS)            .49036
No. of Dependents         .20485
Source of Commission      .22527
Permanent Grade          -.27404
DAFSC (first two)         .08344
Old OER Mean              .19063

Stepwise regression results are displayed below:

Variable                B Coefficient    Beta Coefficient      F

Tenure (TAFCS)             .00952            .46560          566.528
No. of Dependents          .00349            .07148           21.129
Source of Commission       .00290            .06529           16.294
Permanent Grade           -.00206           -.05677            8.950
DAFSC (first two)          .00518            .04323            8.468
Old OER Mean               .00226            .04306            7.701

Application of the B model and the Beta model to the 1976
Colonels file produced the following prediction results:

OLS Predicted 1976 Line Colonels' Retirements

                    B Model               Beta Model
                    Predicted             Predicted
                    1        0            1        0

Actual    1         276     397           341     332
          0         225    2528           337    2422

Cut score prediction = 501                678
Expected value       = 623                690
Observed             = 673

The MLE prediction, consistent with Cases 1 and 2, varies
only slightly from the Beta prediction.

MLE Predicted 1976 Line Colonels' Retirements

                    Predicted
                    1        0

Actual    1         343     342
          0         333    2414

Cut score prediction = 676
Expected value       = 691
Observed             = 673
Case 3 summary results are as follows:

Cross-validation - 1976 Line Colonels' Retirements

                                B      BETA     MLE

% Hits                          81      80      80
% "Retirees" that retired       55      50      51
% "Stayers" that stayed         86      88      88
% False-positive                 7      10      10
% False-negative                12      10      10
Cut score prediction           501     678     676
Expected value                 623     690     691

Observed Retirees  673

In the interest of brevity, only the detail and summary
results will be displayed for Case 4. The methodology
followed has been sufficiently detailed in Cases 1 through 3.

OLS Predicted 1976 4th Year Group Non-rated Line Separations

                    B Model               Beta Model
                    Predicted             Predicted
                    1        0            1        0

Actual    1        1193     364          1230     327
          0         212    2020           317    1915

Cut score prediction = 1405              1547
Expected value       = 1437              1674
Observed             = 1557

MLE Predicted 1976 4th Year Group Non-rated Line Separations

                    Predicted
                    1        0

Actual    1        1207     350
          0         301    1931

Cut score prediction = 1508
Expected value       = 1730
Observed             = 1557


Cross-validation - 1976 4th Year Group Non-rated Line Separations

                                  B      BETA     MLE

% Hits                            85      83      83
% "Separatees" that separated     85      80      80
% "Stayers" that stayed           85      85      85
% False-positive                   6       8       8
% False-negative                  10       9       9
Cut score prediction            1405    1547    1508
Expected value                  1437    1674    1730

Observed Separations  1557


PART  III.  SUMMARY 


Based on the empirical data presented, the conclusion must
be drawn that the usual B model application of OLS is not
a viable solution to the loss rate prediction problem. (By
this is meant the prediction of the correct number without
regard to the correct classification.) However, either the
OLS Beta model or the MLE model provides highly reliable
predictions of the correct count. To reinforce this point the
predicted counts for the three models are shown below for the
five cross-validation cases.


Cut Score Predictions

                                      Observed      B     Beta     MLE

Case 1 - Positive data shift              15       49       17      17
Case 1 - Negative data shift              15        5       16      11
Case 2 - 1975 Colonels' Retirement       786      588      774     807
Case 3 - 1976 Colonels' Retirement       673      501      678     676
Case 4 - 1976 4th Yr Gp Separation      1557     1405     1547    1508

Expected Value Predictions

                                      Observed      B     Beta     MLE

Case 1 - Positive data shift              15       35       15      16
Case 1 - Negative data shift              15        7       15      14
Case 2 - 1975 Colonels' Retirement       786      697      857     859
Case 3 - 1976 Colonels' Retirement       673      623      690     691
Case 4 - 1976 4th Yr Gp Separation      1557     1437     1674    1730

Further examination of the data shown above leads to the
conclusion that the cut score predictions of the Beta and
MLE models are consistently more reliable than the expected
value predictions. This suggests that it is more difficult
to correctly map the individual probabilities into the future
than it is to map a 1/0 criterion such as is done with a cut
score applied to the estimates of the probabilities.
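
For the three live-data cases, the size of the gap can be tabulated as a percentage error against the observed count. The sketch below is restricted to Cases 2 through 4 and uses the figures from the two tables above; the helper name and the data layout are illustrative only.

```python
# (observed, Beta cut score, MLE cut score, Beta expected value, MLE expected value)
live_cases = {
    "Case 2": (786, 774, 807, 857, 859),
    "Case 3": (673, 678, 676, 690, 691),
    "Case 4": (1557, 1547, 1508, 1674, 1730),
}

def pct_error(predicted, observed):
    """Absolute error as a percentage of the observed count."""
    return 100.0 * abs(predicted - observed) / observed

errors = {
    name: {
        "beta_cut": pct_error(beta_cut, obs),
        "mle_cut": pct_error(mle_cut, obs),
        "beta_ev": pct_error(beta_ev, obs),
        "mle_ev": pct_error(mle_ev, obs),
    }
    for name, (obs, beta_cut, mle_cut, beta_ev, mle_ev) in live_cases.items()
}

for name, e in errors.items():
    print(name, {k: round(v, 2) for k, v in e.items()})
```

In each of these three cases the cut score error is smaller than the corresponding expected value error for both models.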

Significance tests at .05 were applied to Cases 2 and 3, the
Line Colonel Retirement data. The conclusion was drawn that
the B model predictions were not equivalent to the observed
values, while both the Beta and MLE predictions were equivalent
to the observed values. Further, the predictions of the Beta
and MLE models were found to be equivalent at .05. Thus the
selection of a model reduces to one of economics.

Lockman and Warner (#3) noted that MLE will require more
resources than OLS. In this study, the MLE program employed
required from 2.3 to 5 times the computer processor time
required by OLS. It should be further noted that the cases
analyzed were small compared to typical real-world problems,
and that MLE tends toward exponentially increasing resource
consumption with increase in the number of independent variables.
Thus the analyst is left on the horns of a dilemma: MLE is
theoretically more accurate but at the same time may be
significantly more costly to use. The OLS Beta model will
always cost less than MLE but may not be reliably accurate
over the spectrum of problems which must be solved.


PART  IV.  CONCLUSION 


Loss rate generation can be dichotomized into two classes:
first, the problem of predicting the correct number for
applications such as force structure modeling, and second,
the problem of classification, which finds application in
screening for selective admission. The empirical evidence
presented for a diversity of data in this study supports
the use of the Beta model for the prediction of the number
of losses. Cut score predictions for the Beta model in all
cases were as good as the MLE predictions. In addition,
the Beta model can be used with only marginal increase in costs
over the B model (Beta requires the mean and standard deviation
of the independent data in the observations to be predicted),
while the MLE model incurs greater resource costs. There is,
however, one serious shortcoming of the Beta model, also
shared by the MLE model. Neither of the techniques consistently
provided reasonable Expected Value predictions. Thus, until
further research unravels the Expected Value anomalies, the
recommendation is made that the Beta model using cut score
for the 1/0 criterion be used for forecasting loss rates.
PART V. ET CETERA


Several  loose  ends,  questions,  and  conjectures  are  left 
which  did  not  appropriately  fit  in  the  above  discussion. 

This  section  will  present  those  remaining  elements  for 
deliberation  and  possible  resolution. 

First and foremost is the contention in the literature that
the OLS B model and Beta model are mathematically equivalent.
They are, indeed, equivalent when applied to the sample test
file and when applied in the typical manner of cross-validation
(see discussion below). However, the cases examined in this
study clearly indicate that the two models are not equivalent
when applied to real-world, or controlled data, problems.

As is suggested by Fast and Dempsey (#4), the OLS B model
perishes under the influence of data shifts. The Beta model
does not. Therefore the two models are not equivalent in
application, or when cross-validated in the loss forecasting
arena.

Cross-validation as documented, taught, and practiced has no
relevance to loss forecasting. Typically the researcher will
randomly divide a data file into two statistically equivalent
halves, build a predictor model with one half, and cross-
validate into the other half. In practice, this technique
will always cross-validate if the researcher is successful
in producing statistically equivalent files. The additional
step of constructing independent predictor models in each
file and cross-validating into the other file contributes
nothing to loss forecasting. It is also the incorrect approach.
In loss forecasting, the objective is to build a predictor
model with an historical population which predicts the losses
for a population in the future, which usually will not be
statistically equivalent to the historical population due to
data shifts. To accomplish this, the researcher should employ
an analog of the process to be modeled by using two previous
years of data: one to build the predictor model and the
subsequent year to cross-validate the predictor model. Only
in this way can any conclusions be drawn regarding the
efficiency of models for predicting losses.
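
The contrast between the two validation designs can be sketched as follows. The records, the field names, and the size of the drift are hypothetical; the point is only that a year-to-year split preserves whatever shift exists between the build and validation populations, whereas random halves are drawn from one pooled population.

```python
import random
from statistics import mean

def random_halves(records, rng):
    """Conventional cross-validation: shuffle, then split in two."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def year_pair(records, build_year):
    """Loss-forecasting analog: build on one year, validate on the next."""
    build = [r for r in records if r["year"] == build_year]
    validate = [r for r in records if r["year"] == build_year + 1]
    return build, validate

# Hypothetical records in which tenure drifts upward by 3 between years.
records = (
    [{"year": 1975, "tenure": t} for t in range(20, 30)]
    + [{"year": 1976, "tenure": t} for t in range(23, 33)]
)

build, validate = year_pair(records, 1975)
shift = mean(r["tenure"] for r in validate) - mean(r["tenure"] for r in build)

half_a, half_b = random_halves(records, random.Random(0))
```

Here `shift` recovers the drift between the two years, which is exactly the condition a loss-forecasting model must survive.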

Predicting losses over time leads to the concomitant shift of
data over time. Any model nominated for this task must be able
to accommodate shifts in data values. Consideration of this
requirement resulted in the conjecture that the three models
evaluated are equal if there is no shift of data in the cross-
validation file, as is the case with statistically equivalent
files.
Is the Beta model equivalent to the MLE model? Or even more
accurate? Isolating the question to the five cases presented
would result in a strong affirmative to the first question and
a "perhaps" to the second. However, such is not the case in
general.

Without a doubt, cases could be contrived, particularly bivariate
cases, in which the Beta model would be inferior to the MLE
model. This is an important consideration because loss rates
will usually present a multivariate problem. With this type of
problem, and more particularly with data which provide low
R-Squares, it is conjectured that the sigmoid curve produced
by MLE will be essentially flat and thus the equivalent of the
Beta model.
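
One way to read the "flat sigmoid" conjecture is that, when the predictors move the fitted score over only a narrow range, a logistic curve is nearly a straight line over that range. The sketch below assumes the MLE model is logistic, takes a 30 percent base rate as in Case 1, and uses two arbitrary score ranges; all names and values are illustrative.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def tangent_line(z, z0):
    """Straight-line approximation to the logistic curve at z0."""
    p0 = logistic(z0)
    return p0 + p0 * (1.0 - p0) * (z - z0)

def max_gap(z0, half_width, steps=200):
    """Largest gap between curve and line over z0 +/- half_width."""
    points = [z0 - half_width + 2.0 * half_width * i / steps
              for i in range(steps + 1)]
    return max(abs(logistic(z) - tangent_line(z, z0)) for z in points)

z0 = math.log(0.3 / 0.7)       # score at which the probability is .30
weak_gap = max_gap(z0, 0.5)    # weak predictors: narrow score range
strong_gap = max_gap(z0, 3.0)  # strong predictors: wide score range
```

Under these assumptions the narrow-range gap is on the order of one percentage point, while the wide-range gap is far larger, which is consistent with the conjecture for low R-Squared problems but does not establish it.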

Finally, the classification question. In officer loss rate
generation, correct classification is not germane; however,
the serendipity which is evident should not be ignored, for
the Air Force personnel managers are responsible for accession
actions in which classification is essential. Reference to
the summary data shown in Part II, above, reveals that the
Beta and MLE models classify with approximately the same
accuracy. One other form of the Beta model was investigated
as a result of informal correspondence with Dr. Joe Ward at
the Air Force Human Resources Laboratory. Ward suggested
that a more accurate Beta model could be derived by using
the validity vector from the sample test file and the correlation
matrix from the cross-validation file. This is in the form
of [Beta] = [R2]^-1 [V1]. Such a prediction equation was built
and tested in Case 4, the 4th Year Group Non-rated Line
Separations. For comparison purposes the summary table is
presented.


Cross-validation #4 - 4th Year Group 1976 Non-rated Line Sep

                                  B      Beta     MLE     WARD

% Hits                            85      83      83      85
% "Separatees" that separated     85      80      80      86
% "Stayers" that stayed           85      85      85      85
% False-positive                   6       8       8       5
% False-negative                  10       9       9       9
Cut score prediction            1405    1547    1508    1396
Expected value                  1437    1674    1673    1673

Observed Separations  1557

Although the Ward method resulted in the worst cut score
prediction (90% of actual separations), the accuracy on
classification is the best of the four methods. This suggests
an area for further research for those who have an interest in
screening tools.
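
For readers who wish to experiment with the Ward form, the computation [Beta] = [R2]^-1 [V1] amounts to solving a small linear system. The sketch below uses a hypothetical two-predictor example; the correlation values and the `solve` helper are illustrative and are not taken from the Case 4 data.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

# Hypothetical two-predictor example.
v1 = [0.40, 0.20]               # validity vector, sample test file
r1 = [[1.0, 0.3], [0.3, 1.0]]   # predictor correlations, sample test file
r2 = [[1.0, 0.5], [0.5, 1.0]]   # predictor correlations, cross-validation file

beta_usual = solve(r1, v1)      # ordinary standardized coefficients
beta_ward = solve(r2, v1)       # Ward form: cross-validation file correlations
```

Comparing `beta_usual` with `beta_ward` shows how much the weights move when only the predictor intercorrelations change between files.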